Sparse Subspace Clustering (SSC) has achieved state-of-the-art clustering
quality by performing spectral clustering over an $\ell_1$-norm based similarity
graph. However, SSC is a transductive method: it cannot handle data that
were not used to construct the graph (out-of-sample data). For
each new datum, SSC requires solving $n$ optimization problems in $O(n)$
variables, because the algorithm must be rerun over the whole data set, where
$n$ is the number of data points. It is therefore inefficient to apply
SSC to fast online clustering and scalable graphing. In this letter, we
propose an inductive spectral clustering algorithm, called inductive Sparse
Subspace Clustering (iSSC), which makes SSC feasible for clustering
out-of-sample data. iSSC adopts the assumption that high-dimensional data
actually lie on a low-dimensional manifold, so that out-of-sample data
can be grouped in the embedding space learned from in-sample data.
Experimental results show that iSSC is promising in clustering
out-of-sample data.

Introduction: Spectral clustering is one of the most popular subspace
clustering algorithms. It aims to find the cluster membership of data
points and the corresponding low-dimensional representation by utilizing
the spectrum of a Laplacian matrix. The entries in the Laplacian matrix
depict the similarity among data points. Thus, the construction of the similarity
graph lies at the heart of spectral clustering. In a similarity graph, a
vertex denotes a data point and the connection weight between two points
represents their similarity.

Recently, Elhamifar and Vidal [1] constructed a similarity graph by
using $\ell_1$-minimization based coefficients and performed spectral clustering
over the graph; the method is named Sparse Subspace Clustering (SSC). It automatically
selects the nearby points for each datum by utilizing the principle of
sparsity, without pre-determining the size of the neighborhood. SSC
has achieved impressive performance in image clustering and motion
segmentation. However, it requires solving $n$ optimization problems over
$n$ data points and calculating the eigenvectors of an $n \times n$ matrix, resulting
in a very high computational complexity. In general, the time complexity
of SSC is proportional to the cube of the data size. Thus, any medium-sized
data set will raise scalability issues with SSC. In addition, SSC
is a transductive algorithm that cannot handle data not used
to construct the graph (out-of-sample data). For each new datum, SSC
needs to rerun the algorithm over the whole data set, which makes SSC
inefficient for fast online clustering and scalable grouping.



To address the scalability issue and the out-of-sample problem in SSC,
we propose an inductive clustering algorithm called inductive
Sparse Subspace Clustering (iSSC). Our motivation derives
from a widely accepted assumption in manifold learning that
high-dimensional data actually lie on a low-dimensional manifold. Therefore,
we can obtain the cluster membership of out-of-sample data by assigning
them to the nearest cluster in the embedding space learned from
well-sampled in-sample data. In other words, we resolve the out-of-sample
problem in SSC by using a subspace learning method. Moreover,
for a large-scale data set, we randomly split it into two parts, in-sample data
and out-of-sample data, so that the scalability issue can be addressed as
an out-of-sample problem.

Except in some specified cases, lower-case bold letters represent
column vectors and upper-case bold ones represent matrices. $\mathbf{A}^T$ denotes
the transpose of the matrix $\mathbf{A}$, whose pseudo-inverse is $\mathbf{A}^{-1}$, and $\mathbf{I}$ is
reserved for the identity matrix.

Inductive Sparse Subspace Clustering Algorithm: The basic idea of our
approach is as follows. Suppose two data sets $\mathbf{Y} \in \mathbb{R}^{m \times p}$ (in-sample data)
and $\mathbf{X} \in \mathbb{R}^{m \times n}$ (out-of-sample data) are drawn from multiple underlying
manifolds, each of which corresponds to a subspace. Provided $\mathbf{Y}$ is
large enough that the manifolds are well sampled, we expect to learn an
embedding space from $\mathbf{Y}$ and group $\mathbf{X}$ in that space, since it is
more compact and discriminative than the original space (see Fig. 1).

We make SSC feasible for clustering out-of-sample data in a "subspace
clustering, subspace learning and extension" manner. The first two steps
are offline processes that involve only in-sample data, and the last one
groups the out-of-sample data in an online way.

To obtain the cluster membership of in-sample data $\mathbf{Y}$, iSSC first
constructs a similarity graph by minimizing the following objective:

$$\min_{\mathbf{c}_i} \|\mathbf{c}_i\|_1 \quad \text{s.t.} \quad \|\mathbf{y}_i - \mathbf{Y}_i\mathbf{c}_i\|_2 < \delta, \tag{1}$$

where $\mathbf{c}_i$ is the sparse representation of the data point $\mathbf{y}_i \in \mathbb{R}^m$ over
the dictionary $\mathbf{Y}_i = [\mathbf{y}_1 \ldots \mathbf{y}_{i-1}\ \mathbf{y}_{i+1} \ldots \mathbf{y}_p]$, and $\delta > 0$ is the error
tolerance.

Fig. 1 A key observation. (a) Some data points sampled from two 2-dimensional
manifolds (trefoil knots) embedded in a 3-dimensional space; (b) a
plan view of the sampled data; (c) the embedding of the sampled data.
Out-of-sample data points can be easily grouped into the correct
cluster after they are projected into the embedding space.
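The sparse coding step in (1) can be sketched as follows. This is only an illustrative sketch under stated assumptions: it solves the unconstrained (Lagrangian) form $\tfrac{1}{2}\|\mathbf{y}_i - \mathbf{Y}_i\mathbf{c}_i\|_2^2 + \lambda\|\mathbf{c}_i\|_1$ with plain iterative soft-thresholding (ISTA) instead of the Homotopy optimizer used in the letter, and the names `sparse_code`, `sparse_coefficients`, `lam` and `n_iter` are hypothetical. Row `i` of the returned matrix stores the coefficients of the point $\mathbf{y}_i$, with a zero on the diagonal.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sparse_code(y, D, lam=0.05, n_iter=1000):
    """Approximately minimise 0.5 * ||y - D c||_2^2 + lam * ||c||_1 with ISTA.

    Assumption: this Lagrangian form stands in for the constrained problem (1).
    """
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y)
        c = soft_threshold(c - step * grad, step * lam)
    return c

def sparse_coefficients(Y, lam=0.05, n_iter=1000):
    """Code every column y_i of Y (m x p) over the remaining columns.

    Row i of the returned p x p matrix holds the coefficients of y_i.
    """
    p = Y.shape[1]
    C = np.zeros((p, p))
    for i in range(p):
        others = np.delete(np.arange(p), i)  # column indices of the dictionary Y_i
        C[i, others] = sparse_code(Y[:, i], Y[:, others], lam, n_iter)
    return C
```

For points drawn from two orthogonal one-dimensional subspaces, the coefficients linking points of different subspaces stay exactly zero in this sketch, which is the property the similarity graph relies on.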

After getting the coefficients of $\mathbf{Y}$, iSSC performs normalized spectral
clustering over the sparse coefficients to get the cluster membership of $\mathbf{Y}$,
as SSC does. However, SSC cannot efficiently cope with out-of-sample
data. Motivated by the manifold learning assumption, we aim to group
out-of-sample data in the embedding space. In this letter, following the
embedding procedure of the Neighborhood Preserving Embedding (NPE) algorithm
[3], we perform subspace learning to compute the projection matrix
$\mathbf{W}$ via

$$\min_{\mathbf{W}} \sum_{i=1}^{p} \Big\| \mathbf{W}^T\mathbf{y}_i - \sum_{j=1}^{p} C_{ij}\mathbf{W}^T\mathbf{y}_j \Big\|_2^2 \quad \text{s.t.} \quad \mathbf{W}^T\mathbf{Y}\mathbf{Y}^T\mathbf{W} = \mathbf{I}, \tag{2}$$

where $\mathbf{C} \in \mathbb{R}^{p \times p}$ is the collection of the sparse representations of $\mathbf{Y}$
produced by (1), with $C_{ij}$ the coefficient of $\mathbf{y}_j$ in the representation of $\mathbf{y}_i$, and the constraint term fixes the scale of the embedding (scale invariance).
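One way to see how an eigenvalue problem arises is to expand the reconstruction error in trace form. The derivation below is a sketch that assumes the NPE-style objective $\sum_i \|\mathbf{W}^T\mathbf{y}_i - \sum_j C_{ij}\mathbf{W}^T\mathbf{y}_j\|_2^2$ with $C_{ij}$ the coefficient of $\mathbf{y}_j$ in the representation of $\mathbf{y}_i$; with the transposed storage convention for $\mathbf{C}$, the last term would read $\mathbf{C}\mathbf{C}^T$ instead.

$$
\begin{aligned}
\sum_{i=1}^{p}\Big\|\mathbf{W}^T\mathbf{y}_i-\sum_{j=1}^{p}C_{ij}\mathbf{W}^T\mathbf{y}_j\Big\|_2^2
&=\big\|\mathbf{W}^T\mathbf{Y}(\mathbf{I}-\mathbf{C})^T\big\|_F^2\\
&=\operatorname{tr}\!\big(\mathbf{W}^T\mathbf{Y}(\mathbf{I}-\mathbf{C})^T(\mathbf{I}-\mathbf{C})\mathbf{Y}^T\mathbf{W}\big)\\
&=\operatorname{tr}\!\big(\mathbf{W}^T\mathbf{Y}\mathbf{Y}^T\mathbf{W}\big)-\operatorname{tr}\!\big(\mathbf{W}^T\mathbf{Y}(\mathbf{C}+\mathbf{C}^T-\mathbf{C}^T\mathbf{C})\mathbf{Y}^T\mathbf{W}\big).
\end{aligned}
$$

Under the constraint $\mathbf{W}^T\mathbf{Y}\mathbf{Y}^T\mathbf{W} = \mathbf{I}$ the first trace is the constant $d$, so minimizing the objective amounts to maximizing $\operatorname{tr}(\mathbf{W}^T\mathbf{Y}\mathbf{M}\mathbf{Y}^T\mathbf{W})$ with $\mathbf{M} = \mathbf{C}+\mathbf{C}^T-\mathbf{C}^T\mathbf{C}$, which is why the largest eigenvalues are the ones of interest.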

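The solution of (2) is given by the maximum eigenvalue solution to the
following generalized eigenvector problem:

$$\mathbf{W}^T\mathbf{Y}(\mathbf{C} + \mathbf{C}^T - \mathbf{C}^T\mathbf{C})\mathbf{Y}^T\mathbf{W} = \lambda\,\mathbf{W}^T\mathbf{Y}\mathbf{Y}^T\mathbf{W}. \tag{3}$$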
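A generalized eigenproblem of this kind can be solved numerically as sketched below. The sketch assumes $\mathbf{M} = \mathbf{C} + \mathbf{C}^T - \mathbf{C}^T\mathbf{C}$ and solves $\mathbf{Y}\mathbf{M}\mathbf{Y}^T\mathbf{w} = \lambda\,\mathbf{Y}\mathbf{Y}^T\mathbf{w}$ by Cholesky whitening followed by a symmetric eigendecomposition; the function name `learn_projection` and the small `ridge` term (added so that $\mathbf{Y}\mathbf{Y}^T$ is safely positive definite) are assumptions for illustration, not part of the letter.

```python
import numpy as np

def learn_projection(Y, C, d, ridge=1e-8):
    """Return (W, eigenvalues): the d generalized eigenvectors of
    (Y M Y^T) w = lambda (Y Y^T) w with the largest eigenvalues.

    Y is m x p (columns are in-sample points), C is p x p.
    The ridge term is an added assumption for numerical stability.
    """
    m = Y.shape[0]
    M = C + C.T - C.T @ C
    A = Y @ M @ Y.T
    A = 0.5 * (A + A.T)                 # remove round-off asymmetry
    B = Y @ Y.T + ridge * np.eye(m)
    L = np.linalg.cholesky(B)           # B = L @ L.T
    L_inv = np.linalg.inv(L)
    S = L_inv @ A @ L_inv.T             # whitened, symmetric problem
    S = 0.5 * (S + S.T)
    evals, evecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:d] # indices of the d largest
    W = L_inv.T @ evecs[:, order]       # map back: W^T B W = I
    return W, evals[order]
```

Because the eigenvectors of the whitened matrix are orthonormal, the returned `W` satisfies the constraint $\mathbf{W}^T\mathbf{Y}\mathbf{Y}^T\mathbf{W} = \mathbf{I}$ up to the ridge term.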



Once the optimal $\mathbf{W}$ is obtained, iSSC projects the out-of-sample data
$\mathbf{X}$ into the embedding space via $\mathbf{W}^T\mathbf{X}$, and then assigns $\mathbf{X}$ to the nearest
cluster in that space.
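The online step can be sketched as a projection followed by a nearest-neighbor lookup, in the spirit of step 7 of the summary (each out-of-sample point inherits the label of its nearest in-sample neighbor). The function name `assign_out_of_sample` and its argument layout are hypothetical; the projection matrix and the in-sample labels are assumed to come from the offline steps.

```python
import numpy as np

def assign_out_of_sample(W, Y, labels_Y, X):
    """Label each column of X with the label of its nearest in-sample point.

    W: m x d projection, Y: m x p in-sample data, labels_Y: length-p labels,
    X: m x n out-of-sample data. Distances are Euclidean in the embedding.
    """
    Zy = W.T @ Y                        # d x p embedded in-sample data
    Zx = W.T @ X                        # d x n embedded out-of-sample data
    # Squared distances between every out-of-sample and in-sample point (n x p).
    d2 = (np.sum(Zx ** 2, axis=0)[:, None]
          - 2.0 * Zx.T @ Zy
          + np.sum(Zy ** 2, axis=0)[None, :])
    nearest = np.argmin(d2, axis=1)
    return np.asarray(labels_Y)[nearest]
```

The cost of this lookup is dominated by the `n x p` distance matrix in `d` dimensions, which matches the $O(dpn)$ search term discussed in the complexity analysis.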

The steps of iSSC can be summarized as follows: 

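1 For in-sample data $\mathbf{Y}$, calculate the sparse representation coefficients
$\mathbf{C}$ by solving, for each $\mathbf{y}_i$,

$$\min_{\mathbf{c}_i} \|\mathbf{c}_i\|_1 \quad \text{s.t.} \quad \|\mathbf{y}_i - \mathbf{Y}_i\mathbf{c}_i\|_2 < \delta.$$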



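2 Construct a Laplacian matrix $\mathbf{L} = \mathbf{I} - \mathbf{S}^{-1/2}\mathbf{A}\mathbf{S}^{-1/2}$ by using the affinity
matrix $\mathbf{A}$, where $\mathbf{S} = \mathrm{diag}\{s_i\}$ with $s_i = \sum_{j} a_{ij}$, $a_{ij}$ is an entry of
$\mathbf{A}$, and $\mathbf{A} = |\mathbf{C}| + |\mathbf{C}|^T$.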

3 Obtain the eigenvector matrix $\mathbf{V} \in \mathbb{R}^{p \times k}$, which consists of the first
$k$ normalized eigenvectors of $\mathbf{L}$ corresponding to its $k$ smallest
eigenvalues.

4 Get the segmentation of $\mathbf{Y}$ by performing the k-means clustering
algorithm on the rows of $\mathbf{V}$.

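5 Suppose the desired dimensionality of the embedding space is $d$; the
projection matrix $\mathbf{W} \in \mathbb{R}^{m \times d}$ is given by the eigenvectors with the $d$
largest eigenvalues of the following eigenvector problem:

$$\mathbf{Z}\mathbf{M}\mathbf{Z}^T = \lambda\,\mathbf{Z}\mathbf{Z}^T,$$

where $\mathbf{M} = \mathbf{C} + \mathbf{C}^T - \mathbf{C}^T\mathbf{C}$ and $\mathbf{Z} = \mathbf{W}^T\mathbf{Y}$.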

6 Project out-of-sample data $\mathbf{X}$ into the $d$-dimensional space via $\mathbf{W}^T\mathbf{X}$.

7 Search for the nearest neighbor of $\mathbf{X}$ from $\mathbf{Y}$ in the embedding space, and
assign $\mathbf{X}$ to the cluster that the neighbor belongs to.
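Steps 2 to 4 of the summary above (affinity, Laplacian, eigenvectors, k-means) can be sketched in NumPy as follows. Several choices here are assumptions made for illustration and are not stated in the letter: the Laplacian is taken in the form $\mathbf{I} - \mathbf{S}^{-1/2}\mathbf{A}\mathbf{S}^{-1/2}$ so that the smallest eigenvalues are the relevant ones, the rows of $\mathbf{V}$ are rescaled to unit length before clustering, and k-means is initialized with a deterministic farthest-first rule. The function names are hypothetical.

```python
import numpy as np

def spectral_embedding(C, k):
    """Steps 2-3: affinity A = |C| + |C|^T, normalized Laplacian, k eigenvectors."""
    A = np.abs(C) + np.abs(C).T
    s = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(s, 1e-12))
    L = np.eye(len(s)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, evecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    V = evecs[:, :k]                    # eigenvectors of the k smallest eigenvalues
    # Assumption: unit-length rows, a common extra step in spectral clustering.
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.maximum(norms, 1e-12)

def kmeans(V, k, n_iter=100):
    """Step 4: plain Lloyd iterations with farthest-first initialization."""
    centers = [V[0]]
    for _ in range(1, k):
        d2 = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            V[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

When the coefficient matrix is block diagonal, that is, when no point is represented by points of another subspace, the embedded rows of each block coincide and k-means recovers the blocks.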

Computational Complexity Analysis: Suppose the in-sample data $\mathbf{Y} \in \mathbb{R}^{m \times p}$
are drawn from $k$ subspaces. We need $O(t_1 m p^3 + t_2 p k^2)$ to perform SSC over
$\mathbf{Y}$, where $t_1$ and $t_2$ are the numbers of iterations of the Homotopy optimizer [2]
and the k-means clustering algorithm, respectively. Moreover, we need $O(p^3)$
to compute the projection matrix $\mathbf{W}$. To group out-of-sample data
$\mathbf{X} \in \mathbb{R}^{m \times n}$, we need $O(dmn)$ to obtain its $d$-dimensional representation and
$O(dpn)$ to search for the nearest neighbor of $\mathbf{X}$ from $\mathbf{Y}$ in the embedding
space.

Putting everything together, the computational complexity of iSSC
is $O(t_1 m p^3 + t_2 p k^2 + dpn)$ owing to $d < m$ and $m < p$, where $m < p$
derives from the conditions of compressive sensing theory. Clearly,
under the same conditions, iSSC is more efficient than SSC, whose time
complexity is about $O(t_1 m n^3 + t_2 n k^2)$.

Baselines and Evaluation Metrics: We present the experimental results
of our approach on three real-world data sets, i.e., Extended Yale
Database B (ExYaleB) [4], Pendigit¹ and USPS². ExYaleB contains 2414
facial images of 38 subjects. We cropped the images from 192 × 168 to
48 × 42 and extracted 114 features by using PCA to retain 98% of the energy of
the cropped data. Pendigit and USPS are two handwritten digit data sets
distributed over 10 classes, where Pendigit contains 10992 samples with
54 features and USPS consists of 11000 samples with 256 dimensions.
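The PCA preprocessing can be sketched as below. The letter does not define "energy"; this sketch assumes it means the fraction of the sum of squared singular values of the centered data, and the function name `pca_retain_energy` is hypothetical.

```python
import numpy as np

def pca_retain_energy(X, energy=0.98):
    """Project the columns of X (m x n) onto the fewest principal directions
    whose squared singular values sum to at least `energy` of the total.

    Returns (basis, Z) with basis of shape m x r and Z = basis^T (X - mean).
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # center each feature
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)      # cumulative energy
    r = int(np.searchsorted(ratio, energy)) + 1     # first index reaching the target
    r = min(r, len(s))
    basis = U[:, :r]
    return basis, basis.T @ Xc
```

Applied to data that lie near a three-dimensional subspace, the sketch keeps three components; on the cropped ExYaleB images the letter reports that 98% energy corresponds to 114 features.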

We compared iSSC with three state-of-the-art inductive clustering
algorithms, i.e., Nyström based spectral clustering [5], Spectral
Embedding Clustering (SEC) [6] and Approximate Kernel K-means
(AKK) [7]. Note that Nyström based spectral clustering and SEC
each have two variants. We denote the variants as Nyström,
Nyström_Orth, SEC_K and SEC_R. The approximate affinity matrix
of Nyström is non-orthogonal, while that of Nyström_Orth is
column-orthogonal. SEC_K performs k-means to get the clustering results and
SEC_R adopts the spectral rotation method to obtain the final cluster
assignment matrix. Furthermore, we also report the results of k-means
clustering as a baseline. The MATLAB code of iSSC can be downloaded
at https://www.dropbox.com/s/ju6y9qe7w81cdyp/CodeAndData.zip.

We adopted two widely used metrics, Accuracy and Normalized Mutual
Information (NMI), to measure the clustering quality of the tested methods.
A value of 1 for Accuracy or NMI indicates that the predicted cluster
membership perfectly matches the ground truth, whereas 0 indicates
a total mismatch.
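Both metrics can be sketched in plain Python. The letter does not specify the exact variants, so the following are assumptions: Accuracy is computed under the best one-to-one mapping between predicted and true labels (found here by brute force over permutations, which is only practical for a small number of clusters; the Hungarian algorithm would be the usual choice otherwise), and NMI is normalized by the geometric mean of the two entropies. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_accuracy(truth, pred):
    """Fraction of points labeled correctly under the best label mapping."""
    true_ids = sorted(set(truth))
    pred_ids = sorted(set(pred))
    # Pad with dummies so every predicted cluster can be mapped somewhere.
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    counts = Counter(zip(pred, truth))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        hits = sum(counts[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, hits)
    return best / len(truth)

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(truth, pred):
    """Mutual information divided by sqrt(H(truth) * H(pred))."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum((c / n) * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = math.sqrt(entropy(truth) * entropy(pred))
    if denom == 0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if entropy(truth) == entropy(pred) else 0.0
    return mi / denom
```

Both functions are invariant to a renaming of the predicted cluster labels, which is what makes them suitable for comparing a clustering against ground-truth classes.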

In all experiments, the parameters of each algorithm were tuned
to achieve its best Accuracy. Specifically, iSSC adopted the Homotopy
optimizer [2] to solve the $\ell_1$-minimization problem. The optimizer needs
two user-specified parameters, the sparsity parameter $\lambda$ and the error tolerance
parameter $\delta$. We found good value combinations by searching over $\lambda \in
\{10^{-7}, 10^{-6}, 10^{-5}\}$ and $\delta \in \{10^{-3}, 10^{-2}, 10^{-1}\}$. Moreover, iSSC groups
out-of-sample data in a low-dimensional space that preserves 98% of the
energy of the embedding space learned from in-sample data. For the other
competing methods, we set the value ranges of the parameters by
following the configurations in [5], [6] and [7].

Results: To examine the effectiveness of the algorithms, we randomly
selected half of the images (1212) from ExYaleB as in-sample data and
used the remaining samples as out-of-sample data. Similarly, we
formed two data sets by choosing 1000 samples each from Pendigit and USPS
as in-sample data and using the rest as out-of-sample data.
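The random split into in-sample and out-of-sample data can be sketched as follows. The function name `split_in_out` and the `seed` argument are hypothetical, and the letter does not say how its random selection was seeded.

```python
import numpy as np

def split_in_out(X, n_in, seed=0):
    """Randomly pick n_in columns of X (m x N) as in-sample data.

    Returns (Y, X_out, in_idx, out_idx); the index arrays make it possible
    to map predicted labels back to the original sample order.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    in_idx, out_idx = perm[:n_in], perm[n_in:]
    return X[:, in_idx], X[:, out_idx], in_idx, out_idx
```

For instance, `split_in_out(data, 1000)` would mirror the Pendigit and USPS setting described above, with `data` holding one sample per column.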

Table 1: Performance comparison of different algorithms over ExYaleB.

| Algorithms          | Accuracy | NMI    | Time (s) |
|---------------------|----------|--------|----------|
| iSSC (1e-6, 1e-3)   | 59.69%   | 62.77% | 24.88    |
| Nyström (12)        | 25.72%   | 46.57% | 9.33     |
| Nyström_Orth (2)    | 21.71%   | 41.74% | 58.87    |
| SEC_K (1e+12, 5, 1) | 11.02%   | 11.09% | 34.91    |
| SEC_R (1e+9, 4, 1)  | 5.97%    | 4.31%  | 19.96    |
| AKK (0.4)           | 8.00%    | 9.01%  | 9.94     |
| k-means             | 9.03%    | 11.20% | 37.05    |



Table 2: Performance comparison of different algorithms over Pendigit.

| Algorithms         | Accuracy | NMI    | Time (s) |
|--------------------|----------|--------|----------|
| iSSC (1e-6, 0.1)   | 84.94%   | 71.17% | 28.03    |
| Nyström (0.8)      | 76.09%   | 68.10% | 8.54     |
| Nyström_Orth (6)   | 75.72%   | 67.04% | 39.24    |
| SEC_K (1e-6, 4, 1) | 76.91%   | 63.84% | 24.51    |
| SEC_R (1e-9, 1, 1) | 11.01%   | 1.18%  | 18.14    |
| AKK (0.9)          | 77.22%   | 69.48% | 9.80     |
| k-means            | 77.05%   | 69.21% | 30.21    |



Tables 1-3 report the clustering quality and the time costs of the tested
algorithms over the data sets. The tuned parameters with which the best
Accuracy was achieved are shown in parentheses. From the results, we
have the following observations:



¹ http://archive.ics.uci.edu/ml/datasets.html
² http://www.cs.nyu.edu/~roweis/data.html



Table 3: Performance comparison of different algorithms over USPS.

| Algorithms         | Accuracy | NMI    | Time (s) |
|--------------------|----------|--------|----------|
| iSSC (1e-7, 0.01)  | 52.93%   | 52.90% | 41.52    |
| Nyström (14)       | 47.66%   | 44.42% | 15.91    |
| Nyström_Orth (0.5) | 50.70%   | 44.60% | 183.37   |
| SEC_K (1e-9, 3, 1) | 47.63%   | 42.28% | 43.38    |
| SEC_R (1e-6, 4, 1) | 11.70%   | 1.44%  | 19.78    |
| AKK (0.3)          | 48.49%   | 46.79% | 16.81    |
| k-means            | 46.54%   | 45.61% | 250.82   |



• In all the tests, iSSC strikes a good balance between running
time and clustering quality. Although iSSC is not the fastest algorithm,
it outperforms the other tested methods by considerable
margins in Accuracy and NMI. For example, on the ExYaleB database, iSSC
achieved gains of 33.97 percentage points in Accuracy and 16.20 percentage
points in NMI over the second-best algorithm.

• The kernel-based method AKK is more competitive
on handwritten digit data than on facial data.
Moreover, AKK performed very close to the k-means algorithm, which is
consistent with the results in [7].

Conclusion: In this letter, we have presented an inductive spectral
clustering algorithm, called inductive Sparse Subspace Clustering (iSSC).
The algorithm, which is an out-of-sample extension of the Sparse Subspace
Clustering algorithm (SSC) [1], scales linearly with the problem size, so
that it can be applied to fast online learning. Experimental results on
facial image and digit image clustering indicate the effectiveness of iSSC
compared with state-of-the-art approaches.